Semantic Understanding of Scenes through the ADE20K Dataset
نویسندگان
چکیده
Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision. Despite the community’s efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with dense and detailed annotations for scene parsing. In this paper, we introduce and analyze the ADE20K dataset, spanning diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. A generic network design called Cascade Segmentation Module is then proposed to enable the segmentation networks to parse a scene into stuff, objects, and object parts in a cascade. We evaluate the proposed module integrated within two existing semantic segmentation networks, yielding significant improvements for scene parsing. We further show that the scene parsing networks trained on ADE20K can be applied to a wide variety of scenes and objects1.
منابع مشابه
Scene Parsing with Global Context Embedding Supplementary Materials
In Figure 1, we show the visualization in the feature space of the trained global context features. We sample 1000 images from the MIT ADE20k dataset [7] and pass these images through our proposed global context network to extract 4096dimensional context feature vectors. We use the t-SNE [4] algorithm to reduce the dimension of the extracted feature vectors from 4096 to 2 for better visualizati...
متن کاملContext Encoding for Semantic Segmentation
Recent work has made significant progress in improving spatial resolution for pixelwise labeling with Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures ...
متن کاملSemantic Segmentation via Highly Fused Convolutional Network with Multiple Soft Cost Functions
Semantic image segmentation is one of the most challenged tasks in computer vision. In this paper, we propose a highly fused convolutional network, which consists of three parts: feature downsampling, combined feature upsampling and multiple predictions. We adopt a strategy of multiple steps of upsampling and combined feature maps in pooling layers with its corresponding unpooling layers. Then ...
متن کاملThe Cityscapes Dataset
Semantic understanding of urban street scenes through visual perception has been widely studied due to many possible practical applications. Key challenges arise from the high visual complexity of such scenes. In this paper, we present ongoing work on a new large-scale dataset for (1) assessing the performance of vision algorithms for different tasks of semantic urban scene understanding, inclu...
متن کاملTranslation and Hybridity in Scenes and Frames Semantics
The present study is a theoretical attempt to illustrate how Fillmore's Scenes and Frames Semantics (SFS) could be employed as a framework to portray the process of understanding and translating hybrid texts. It first reviews the origin of SFS; then it maps SFS onto Nida’s linguistic model of translation process and the Interpretive Theory of Translation; it examines in the next section, withi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1608.05442 شماره
صفحات -
تاریخ انتشار 2016